ABSTRACT

This report proposes a method for achieving
improvements in the precision of determining Fleet Replacement
Squadron (FRS) student aviator proficiency. The proposed method,
called the Computer Aided Training Evaluation and Scheduling (CATES)
system, provides a computer managed, prescriptive training program
based on individual student performance. Designed to formalize and
quantify the parameters of the decision process used to determine
proficiency, the CATES system (1) clearly defines the level of skills
required of the FRS graduate; (2) adds precision to instructor pilot
judgments by providing a more clearly defined comparison standard;
(3) increases reliability of instructor pilot judgments by grading
each task execution rather than using the instructor's subjective
judgment; (4) lists tasks for individual students depending upon
whether proficiency has been attained, has not been attained, or has
not been determined; and (5) provides an acceptable and workable
performance assessment schema for use in a computer-managed
instruction system. Twelve references are listed, and a mathematical
discussion of the Wald Binomial Probability Ratio Test is appended.
(Author/LLS)
SECTION I

INTRODUCTION

A major problem in the Fleet Replacement Squadron (FRS) [...] the
appropriate amount of in-flight training that [...] trainee to meet the
objectives of the FRS [...]. [...] to achieve precise determination of
pilot proficiency commensurate with squadron goals has hampered [...]
effectiveness and efficiency. Improving precision in judging pilot
proficiency and enhancing management ability in prescribing training
sensitive to [...] differences in student performance and instructor
evaluation continue to be a prime requirement in military flight training.

This report proposes a method for achieving improvements in the precision
of proficiency judgments and in determining student proficiency. This
proposed solution, identified as the Computer Aided Training Evaluation and
Scheduling (CATES) system, provides a computer managed, prescriptive training
program based on individual student performance. In essence, the CATES
system emphasizes the following:

• clearly defines the level of skills required of the FRS graduate

• adds precision to instructor pilot judgments by providing a more
clearly defined comparison standard

• increases reliability of instructor pilot judgments by grading each
task execution rather than using the instructor pilot's "subjective
average" of all task executions

• lists tasks for individual students indicating one of the following
decisions:

desired proficiency attained

proficiency below acceptable limits, or

proficiency undetermined, continue training/practice.

• provides an acceptable and workable performance assessment schema
for use in a Computer Managed Instruction (CMI) system.

This report describes the problems encountered in attempts to determine
current task performance of students and the conceptual development of
the CATES system as a method that may be used in making proficiency
determinations. An effort is in progress to test the operational feasibility
of the CATES system and to evaluate the validity of proficiency
determinations made by the system. Results of this effort will be presented
in a future [...].

In addition to this introduction, three sections and one appendix are
presented. Section II presents the method designed to strengthen [...].
[...] the criteria continue to be based on subjective efforts of instructor
pilots; by clarifying the tasks to be measured and [...] standard on which
to base [...] judgments, the criteria [...] reflect a greater precision.

Section III presents the method for formalizing and quantifying the
parameters of the proficiency determination process.

Section IV presents preimplementation considerations of the CATES
program and its applicability at a specific FRS. Issues to be tested as well
as future implications of the CATES system are discussed.
SECTION II

IMPROVEMENT OF GRADING PROCEDURES

CURRENT PRACTICE

Determination of the proficient performance of aircraft flying tasks
continues to be a subjective judgment made by instructor pilots. Current
practice in training squadrons consists of "flights" during which a subset of
tasks in the training syllabus are performed a varying number of times by
the pilot trainee at the discretion of the instructor pilots. During or
shortly after each flight, the instructor pilot "grades" the pilot trainee on
the tasks performed using a standard scale but also employing his own personal
criteria. While instructors differ in their personal rating bias (hard-easy),
they attempt to grade in terms of "average performance at this stage
of training." It is usual for the pilot trainee to be exposed to several
different instructor pilots. After a specified minimum number of flights,
and a recommendation by an instructor pilot, the pilot trainee is scheduled
for a final "check flight." His performance on selected tasks is graded by
an instructor pilot acting in the independent role of "check pilot." Should
the pilot trainee not perform the flight consonant with the standards of
performance expected of him by the "check pilot," he is rescheduled for
additional "check flights" until he is deemed proficient.

Student exposure to training tasks can be variable due to instructor
differences and varying performance standards. In addition, each individual
pilot trainee exhibits variability in successive performances on complex
procedural and psychomotor tasks. This variability of skilled task
performance has been well documented (Fitts and Posner, 1968). Further
compounding this problem of inconsistent performance, the pilot trainee is
transitioning from a level of performance well below the required level to a
required standard of performance. This transition reflects different learning
rates by the individual pilot trainees. Learning rates are also highly
variable within and between individuals (Sidman, 1960). Under current
practice it is therefore difficult to determine when a trainee's performance
has leveled off at a point commensurate with the desired performance
standards.

PROFICIENCY GRADING SYSTEM 

In a series of studies conducted by the Training Analysis and Evaluation 
Group (TAEG) to determine the effectiveness of Device 2F87F (P-3 Operational 
Flight Trainer) in the FRS, the inadequacies of current grading procedures 
were recognized (Browning, Ryan, Scott, and Smode, 1977; Browning, Ryan, and 
Scott, 1978). To overcome these inadequacies, the TAEG instituted a
"proficiency grading system." The system provided a clearer picture of the
trainee's flight task performance in both simulator and aircraft training.
The proficiency grading system still required a subjective judgment by
instructor and check pilots. However, the instructors graded task performance
against a precise standard: "P was defined as performance estimated to be
equivalent to that required to demonstrate competence in that task on the
conventional FLY 6 check" (Browning, et al., 1977, p. 20). This standard
focuses on the required terminal level of performance; i.e., the objective
of training.

Actual grading of performance was accomplished using a dichotomous scale.
Task performance that met or exceeded the standard was recorded as "P"; task
performance that did not meet the standard was recorded as "1." The
proficiency grading introduced by the TAEG had a further requirement.
Performance was graded each time the task was performed and this series of
graded trials was recorded and kept in the sequence of presentation. The
procedure of grading each task trial as it was performed eliminated the
requirement for the instructor to make a summary judgment of task proficiency
based on pilot trainee performance of successive task trials during a flight.

The advantages of a proficiency grading system for increasing the
precision of performance judgments have been incorporated in the CATES system.
The performance standard used in the CATES system is defined as task
performance estimated to be equivalent to that required to earn an adjective
rating of "Qualified" and/or a numerical score of 4 on the Naval Air Training
and Operating Procedures Standardization (NATOPS) Program flight evaluation.
The CATES system uses the same proficiency grading procedure as discussed
previously. Although the grading procedure increases the precision, it does
not reduce several sources of variability in trainee performance; e.g., task
difficulty and learning rates.

The proficiency grading procedure results in a task performance or
training protocol for each task. Two hypothetical trainee records (protocols
from the same trainee) are shown in table 1.



TABLE 1.  HYPOTHETICAL TASK PERFORMANCE OF ONE TRAINEE
          FOR TWO DIFFERENT TASKS

Task        Training Protocol

Task A      111P1PPP1PP

Task B      1PPPPPPPPPPP



It could be inferred that "Task A" is more difficult than "Task B" or it
could be inferred that the trainee is more proficient on "Task B" than "Task
A."

Table 2 contains examples of trainee task performance protocols for two
different kinds of tasks and hypothetical task protocols for a trained pilot.
The pilot trainees exhibit different protocols initially (more "1s" than
"Ps") but the variability eventually will diminish. Learning rates differ
among tasks as shown by comparing Task A with Task B. During later flights/
sessions the protocols for the pilot trainee are not readily distinguishable
from those of a trained pilot. A procedural problem remains in determining
when task performance protocols for trainees match the protocols of trained
pilots.

TABLE 2.  COMPARISON OF HYPOTHETICAL TASK PERFORMANCE PROTOCOLS FOR
          TWO DIFFERENT TASKS AND TWO LEVELS OF AVIATOR PROFICIENCY

                    Training Protocol During Flights/Sessions
Task/Aviator        One     Two     Three   Four    Five    Six

TASK A
  Pilot Trainee     111     P11     1P1     1PP     PPP     PP
  Trained Pilot     PPP1    PPP     1PP     P       P1      PPP

TASK B
  Pilot Trainee     11      1P      P       PPP     PP      PP
  Trained Pilot     PP      PPP     P1      PP      P       PP



The essence of the problem lies in assessing, with a specified degree of
confidence, the point at which proficiency has been obtained.

Several ways to deal with the problem were explored. Two approaches 
were found in previous research concerned with proficiency assessment. The 
first approach was to arbitrarily define the point at which proficiency was 
attained by the following rule: 

(1) over 50 percent of the trials (for a given
task) on any flight had to be "P" and (2) at
least 50 percent of the trials were P on all
subsequent flights (Browning, et al., 1978,
p. 23).
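
The rule quoted above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the encoding of each flight as a string of "P" and "1" grades are assumptions made here, not part of the cited study, and the sample protocol is hypothetical.

```python
def first_proficient_flight(flights):
    """Return the 1-based flight number at which the rule is first met, or None.

    flights: list of non-empty strings, one per flight; each character is one
    graded trial ("P" = met the standard, "1" = did not). Hypothetical encoding.
    """
    def proportion_p(flight):
        return flight.count("P") / len(flight)

    for i, flight in enumerate(flights):
        # Condition (1): over 50 percent of the trials on this flight are "P".
        if proportion_p(flight) > 0.5:
            # Condition (2): at least 50 percent "P" on all subsequent flights.
            if all(proportion_p(later) >= 0.5 for later in flights[i + 1:]):
                return i + 1
    return None


# Hypothetical protocol for one task over six flights.
print(first_proficient_flight(["111", "P11", "1P1", "1PP", "PPP", "PP"]))  # 4
```

Condition (2) can only be checked once the later flights have been flown, so the function needs the complete record before it can name the flight.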

The second approach was used in the evaluation of the Initial Entry Rotary 
Wing Flight Training Program by the Army (USAAVNC Evaluation Team, 1979). 
The tasks were graded by daily performance rather than by individual trials; 
however, the approach used to determine proficiency could also be incorporated 
with graded trials. 

The point of principal concern was the training 
day on which the student achieved proficiency 
on each maneuver. Achievement of maneuver pro- 
ficiency was defined as that training day on 
which the third successive (+) grade on the 
maneuver was given the student. That is, the 
student was required to perform a maneuver in 
accord with established USAAVNC standards on 
three successive occasions before he was judged 
to be proficient on that maneuver (USAAVNC 
Evaluation Team, 1979, p. 21). 
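
A minimal Python sketch of the three-successive-grades rule follows. The function name, the `required` parameter, and the "+"/"-" encoding of daily grades are assumptions for illustration; the sample record is hypothetical.

```python
def proficiency_day(daily_grades, required=3):
    """Return the 1-based training day of the third successive "+" grade.

    daily_grades: sequence of "+" (met the standard) or "-" (did not), one
    entry per training day on which the maneuver was graded. Returns None if
    the required run of "+" grades never occurs.
    """
    run = 0
    for day, grade in enumerate(daily_grades, start=1):
        run = run + 1 if grade == "+" else 0  # any other grade resets the run
        if run == required:
            return day
    return None


# Hypothetical record: the run of three "+" grades is completed on day 7.
print(proficiency_day(["-", "+", "+", "-", "+", "+", "+", "+"]))  # 7
```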

While both of the above approaches are logical, objective, and expedient,
they are faulty. Both require training protocols that include initial and
final levels of proficiency to make accurate performance determinations. In
other words, they are "after the fact" rather than predictive. Another flaw
is that an arbitrary number of "P" trials is not realistic across all tasks 
due to differences in task difficulty. In addition, these approaches may not 
accommodate situations where only a small number of training trials are given 
or where there are wide differences in learning rates of trainees. Finally, 
the instructor's judgment may be biased if he has knowledge of an arbitrary 
decision rule. 

SEQUENTIAL METHOD. Both of the above approaches require a sample of trials 
of trainee performance before the rule can be applied. An alternate approach 
would be to examine trials taken one at a time and accumulate the information 
for input into the decision model (Hoel, 1971). Using this approach, one 
would expect to be in a better position to make decisions than if no attempt 
were made to look at the data until a sample of fixed size had been taken. 

There are methods available, using sequential sampling techniques and a 
statistical decision model, that operate on this accumulation of information 
basis and that require considerably less sampling on the average than the 
fixed-size sample methods. The statistical decision model is limited to two 
choices in decision making (three choices if one considers deferring a 
decision as a decision). This limitation is not troublesome when applied to 
proficiency determination. The decisions of primary concern are simply: Is 
the trainee proficient? or, alternatively, Is the trainee not proficient? 
Additional advantages are: (1) Decisions are reached based on a minimum 
number of trials and (2) Decisions are made with an established level of 
confidence. 
SECTION III

CATES DECISION MODEL

The sequential method that may be used as a means for making statistical
decisions with a minimum sample was introduced by Wald (1947). Probability
ratio tests and the corresponding sequential procedures were developed for
several statistical distributions. One of the tests, the binomial probability
ratio test, was formulated in the context of a sampling procedure to determine
whether a collection of a manufactured product should be rejected because the
proportion of defectives is too high or should be accepted because the
proportion of defectives is below an acceptable level. The sequential testing
procedure provides for a postponement of decisions concerning acceptance
[...]. [...] is based on prescribed values of alpha (α) and beta (β) [...]
Type I error [...] declaring something "True" when [...] "False" when it is
"True" (Type II error) [...] of declaring something [...].

[...] occurs after the manufacturing process. In the educational
applications cited above (Ferguson, 1969 and Kalisch, 1980), [...]
sampling occurred after the learning period. In the CATES system, sequential
sampling occurs during the learning period [...].

Figure 1. Hypothetical Sequential Sampling Chart (horizontal axis: Number
of Items Sampled, 1 to 15)

CATES SYSTEM MODEL PARAMETERS

The decision model can be described as consisting of decision boundaries. 
Referring to figure 1, the parallel lines represent those decision boundaries. 
Crossing the upper line, or boundary, results in a decision to "Reject Lot"; 
crossing the lower line, or boundary, results in a decision to "Accept Lot." 
In the CATES system, these decision boundaries translate to "Proficient" and 
"Not Proficient." Calculations of the decision boundaries require four 
parameters. These four parameters are: 

P1          Lowest acceptable proportion of proficient trials (P) required
            to pass the NATOPS flight evaluation with a grade of
            "Qualified." Passage of the NATOPS flight evaluation is required
            to be considered a trained aviator in an operational (fleet)
            squadron.

P2          Acceptable proportion of proficient trials (P) that represent
            desirable performance on the NATOPS flight evaluation.

Alpha (α)   The probability of making a TYPE I decision error (deciding a
            student is proficient when in fact he is not proficient).

Beta (β)    The probability of making a TYPE II decision error (deciding
            a student is not proficient when in fact he is proficient).

Parameter setting is a crucial element in the development of the
sequential sampling decision model. Kalisch (1980) outlines three methods
for selecting proficient/not proficient performance (q0/q1 values) as:

Method 1— External Criterion . Individuals are 
classified as masters, non-masters, or unknown 
on the basis of performance on criteria directly 
related to the instructional objectives. These 
criteria can be in terms of demonstrated levels 
of proficiency either on the job or in a training
environment. The mean proportion of items
answered correctly by the masters on an objective
would provide an estimate for q0. Similarly,
q1 would be the proportion correct for the
non-masters.

Method 2— Rationalization . Experts in the subject 
area who understand the relation of the training 
objectives to the end result; e.g., on-the-job 
performance, select the q0 and q1 values to
reflect their estimation of the necessary levels 
of performance. This method is probably the 
closest to that now used by the Air Force. The 
procedure may provide somewhat easier decision 
making since specifying two values creates an 
indecision zone— neither mastery nor non-mastery. 
This indecision zone indicates that performance 
is at a level which may not be mastery but is
not sufficiently poor to be considered at a non- 
mastery level . 

Method 3—Representative Sample . The scores of 
prior trainees, who demonstrate the entire range 
from extremely poor to exemplary performance 
on objectives, are used to estimate q0 and q1.
The proportion correct for the entire sample is 
used to obtain an initial cutting score C. Scores 
are separated into two categories: (a) those 
scores greater than or equal to C and (b) those 
less than C. For each category, the mean pro- 
portion correct score is computed. The mean for 
the first category equals q0; the mean for the
second category equals q1.
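
Method 3 is essentially a split at the mean followed by two group means. The sketch below is illustrative only: the function name is hypothetical, the scores are invented, and the cutting score C is taken as the mean of the individual proportion-correct scores, which is one reading of "the proportion correct for the entire sample."

```python
from statistics import mean


def representative_sample_q(scores):
    """Estimate (C, q0, q1) from prior trainees' proportion-correct scores."""
    c = mean(scores)                       # initial cutting score C
    upper = [s for s in scores if s >= c]  # category (a): scores >= C
    lower = [s for s in scores if s < c]   # category (b): scores < C
    q0 = mean(upper)
    # If every score is identical, category (b) is empty and q1 is undefined.
    q1 = mean(lower) if lower else None
    return c, q0, q1


# Invented scores spanning poor to exemplary performance.
c, q0, q1 = representative_sample_q([0.30, 0.45, 0.60, 0.80, 0.90, 0.95])
print(round(c, 3), round(q0, 3), round(q1, 3))
```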

Selection of values for P1 and P2 (P1 = q1 and P2 = q0 in Kalisch, 1980) for
the CATES decision model incorporated Method 1 for setting of P1 and Method 3
for setting of P2.

The value selected for P1 was based on the lowest proportion of P grades
(numerical grade of 4.0 on the NATOPS flight evaluation) that may be given
and still result in an overall rating of "Qualified." The NATOPS evaluation
flight consists of a number of flight tasks grouped in areas and subareas.
As the tasks or subareas are performed, the pilot's performance is graded
using a numerical score. Three numerical scores may be awarded: Qualified
performance is assigned a "4," Conditionally Qualified performance is assigned
a "2," and Unqualified performance is assigned a "0." The numerical scores
are averaged across all tasks and subareas to yield an overall numerical
score. To receive an overall rating of "Qualified," the average of all tasks
or subareas must fall within the range of 3.00 to 4.00. Thus, the criteria
for passing the NATOPS flight evaluation with a "Qualified" rating require
that at least 50 percent of the tasks be graded as "Qualified." Therefore,
the lower limit of proficient performance was set at .50 for all tasks.
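
The step from a 3.00 average to a 50 percent proportion can be made explicit. The worked inequality below is a sketch under one assumption not stated in the text: every task that is not graded "Qualified" (4) is graded "Conditionally Qualified" (2). The symbol p, introduced here for illustration, is the proportion of tasks graded 4.

$$4p + 2(1 - p) \ge 3 \;\Longleftrightarrow\; 2p + 2 \ge 3 \;\Longleftrightarrow\; p \ge 0.50$$

If some tasks are instead graded "Unqualified" (0), a larger proportion of 4s is needed to hold the same average; with only 4s and 0s, for example, $4p \ge 3$ gives $p \ge 0.75$. The value .50 is thus the lowest proportion of "Qualified" grades that can still yield an overall "Qualified" rating.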

The value selected for P2 was determined by examining performance scores
of a sample of 49 Naval Aviators' NATOPS flight evaluations given at
Helicopter Antisubmarine Squadron (HS-1), Naval Air Station (NAS) Jacksonville,
Florida. The sample was restricted to only those aviators rated as
"Qualified," thus representing exemplary performance. This examination
revealed the proportion of "Qualified" scores for each subarea and/or
flight task. This proportion is directly translated to P2 values for each
task in the training syllabus.

The selection of alpha (α) and beta (β) should be based on the
criticality of accurate proficiency decisions. Small values of alpha (α) and
beta (β) require additional task trials to make decisions with greater
confidence. Factors that are important in selecting values for alpha (α) and
beta (β) are outlined below:

1. Alpha (α) values

a. Safety— potential harm to the trainee or to others 
due to the trainee's actual non-mastery of the task. 

b. Prerequisite in Instruction— potential problems 

in future instruction, especially if the task is pre- 
requisite to other tasks. 

c. Time/Cost— potential loss or destruction of equipment 
either in training or upon fleet assignment. 

d. Trainee's View of the Training— potential negative 
view by trainee when classified as proficient although
the trainee lacks confidence in that decision. Also, 
after fleet assignment if previous training has not 
prepared him sufficiently the trainee may also have a 
negative view of the training program. 

2. Beta (β) values

a. Instruction— requirement for additional training 
resources (personnel and materials) for unnecessary 
training in case of misclassification as not proficient. 

b. Trainee Attitudes— the attitude of trainees when tasks 
have been mastered yet training continues; trainee 
frustration; corresponding impact on performance in the 
remainder of the training program and fleet assignment. 

c. Cost/Time— the additional cost and time required 
for additional training that is not really needed. 

Alpha (α) and beta (β) values used in the CATES decision model were
arbitrarily selected as .10. A confidence level of 90 percent in decisions
made by the model appears reasonable when the previously discussed factors
are considered. As rigorous field testing of the model is conducted, these
parameters may be modified as indicated by empirical evidence and command
policy. At present, values of .10 appear quite reasonable.

After the model parameters have been selected, calculation of the
decision boundaries may be accomplished using the Wald Binomial Probability
Ratio Test. The appendix provides a formal mathematical discussion of this
test.

To illustrate the differences in task difficulty, two tasks were selected
from the HS-1 training syllabus, and the decision models for these tasks
were calculated. To further show how the decision models serve to aid in
making proficiency decisions, task protocols of a pilot trainee are imposed
on the model.¹ Figure 2 shows the model for the task "Running Takeoff,"
and figure 3 shows the model for the task "Free Stream Recovery."

¹ Actual trial data for a pilot trainee undergoing training at HS-1, NAS
Jacksonville, FL.





Figure 2. Sequential Sampling Decision Model for Running Takeoff Task 

Figure 3. Sequential Sampling Decision Model for Freestream Recovery Task

Empirical data reflect a relative difference in task difficulty. The
sample of NATOPS evaluation scores indicates the proportion of "Qualified" 
scores on the Running Takeoff task was .92, while the proportion of "Quali- 
fied" scores on the Free Stream Recovery task was .77. This relative differ- 
ence in task difficulty is represented in the model as differences between 
the slopes and the widths between the parallel lines of the two models. In 
the case of the Free Stream Recovery task (figure 3), the slopes are less 
steep (indicating more trials to reach proficiency) and the parallel lines 
are farther apart (indicating there will typically be more uncertainty about 
individual trials before a decision can be reached). 
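
The comparison of slopes and widths can be checked numerically. The sketch below is an illustration, not the report's implementation: it assumes the decision lines take the Wald form given in the appendix, with the count of "P" grades plotted against the number of trials, and the function name is hypothetical. The parameter values (P1 = .50, P2 = .92 and .77, alpha = beta = .10) are those stated in the text.

```python
import math


def wald_boundaries(p1, p2, alpha, beta):
    """Return (lower_intercept, upper_intercept, slope) of the decision lines."""
    # Common denominator of both terms; positive whenever p2 > p1.
    g = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    slope = math.log((1 - p1) / (1 - p2)) / g
    lower = math.log(beta / (1 - alpha)) / g   # "Not Proficient" line
    upper = math.log((1 - beta) / alpha) / g   # "Proficient" line
    return lower, upper, slope


for task, p2 in [("Running Takeoff", 0.92), ("Free Stream Recovery", 0.77)]:
    lower, upper, slope = wald_boundaries(0.50, p2, 0.10, 0.10)
    print(f"{task}: slope={slope:.3f}, width={upper - lower:.3f}")
```

Under these assumptions the sketch gives a slope of about 0.75 and a width of about 1.80 for Running Takeoff, against about 0.64 and 3.64 for Free Stream Recovery, which agrees with the qualitative comparison in the text.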

In these examples, the probability of making decision errors (both type
I and type II), as indicated earlier, was set at .10 for both tasks. If this
level of confidence were increased (lower values of alpha (α) and beta (β)),
the region of uncertainty would also increase. The overall result is that
more trials are required to make a decision with increased confidence.
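
The effect of tightening alpha and beta can be illustrated with a short calculation. This is a sketch under the same assumed Wald line form as the appendix; the function name is hypothetical, and the .05 values are chosen here only for comparison.

```python
import math


def region_width_and_min_trials(p1, p2, alpha, beta):
    """Return (width of the uncertainty region, fewest all-"P" trials needed)."""
    g = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    slope = math.log((1 - p1) / (1 - p2)) / g
    upper = math.log((1 - beta) / alpha) / g
    lower = math.log(beta / (1 - alpha)) / g
    # With every trial graded "P" (d = n), "Proficient" requires
    # n >= upper + slope * n, i.e. n >= upper / (1 - slope).
    min_p_trials = math.ceil(upper / (1 - slope))
    return upper - lower, min_p_trials


for err in (0.10, 0.05):
    width, trials = region_width_and_min_trials(0.50, 0.92, err, err)
    print(f"alpha=beta={err:.2f}: width={width:.2f}, minimum P trials={trials}")
```

With P1 = .50 and P2 = .92, lowering both error probabilities from .10 to .05 widens the region from about 1.80 to about 2.41 and raises the shortest run of "P" trials that can produce a "Proficient" decision from 4 to 5.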

Both models, then, reflect rather well the true state of affairs between 
different tasks and their impact on a rational decision process. The differ- 
ences in task difficulty relate directly to differences in the model parameters. 

Figures 2 and 3 also show the decisions reached by the model on student 
performance. The student received a total of eight trials on the Running 
Takeoff task during the training program. The sequence of graded trials and 
the graphical plots of the sequence are shown in figure 2. The first two
trials were judged to be below the standard of performance. On the second 
trial the decision model indicated the student was "Not Proficient" and 
logically should be given remedial or additional training. The sequence is 
initiated again on trial three, and on the fourth trial of that sequence 
(sixth trial given) the model decision was "Proficient." 
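
The Running Takeoff narrative can be reproduced with a short sequential test. The sketch below rests on several assumptions: the function name is hypothetical; the protocol "11PPPP" is reconstructed from the description above (two trials below standard, then trials meeting it) and is not read from figure 2; and the count is restarted after any decision, whereas the text describes a restart only after "Not Proficient."

```python
import math


def classify_trials(trials, p1, p2, alpha=0.10, beta=0.10):
    """Apply the sequential test trial by trial, restarting after each decision.

    trials: iterable of grades, "P" (met the standard) or "1" (did not).
    Returns a list of (trial_number, decision) tuples.
    """
    g = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    slope = math.log((1 - p1) / (1 - p2)) / g
    lower = math.log(beta / (1 - alpha)) / g
    upper = math.log((1 - beta) / alpha) / g

    decisions = []
    n = d = 0  # trials and "P" grades in the current sequence
    for number, grade in enumerate(trials, start=1):
        n += 1
        d += grade == "P"
        if d >= upper + slope * n:
            decisions.append((number, "Proficient"))
            n = d = 0
        elif d <= lower + slope * n:
            decisions.append((number, "Not Proficient"))
            n = d = 0
        # Otherwise the point lies between the lines: continue training.
    return decisions


print(classify_trials("11PPPP", p1=0.50, p2=0.92))
# [(2, 'Not Proficient'), (6, 'Proficient')]
```

With P1 = .50, P2 = .92, and alpha = beta = .10, the sketch reaches "Not Proficient" on the second trial and "Proficient" on the sixth, matching the sequence of decisions described for figure 2.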

Figure 3 shows the protocol for the Free Stream Recovery task. Perhaps 
because of slower acquisition of a more difficult task, two decisions were 
made declaring the student "Not Proficient" in the earlier sessions of task 
exposure. The model does show that more task trials were required before a 
decision could be made about proficiency. This can be attributed to increased 
task difficulty and variability of performance. 

SECTION IV

PLANNING FOR IMPLEMENTATION 

The role of sequential sampling decision models to determine aviation 
task proficiency must be operationally explored in terms of feasibility and 
subsequent validity. A study is currently underway to test the concept at 
the East Coast SH-3 FRS, HS-1, NAS Jacksonville, Florida. The study is
broadly planned as follows: 

1. identify a syllabus of specific training tasks 

2. establish proficiency decision model parameters from prior data 
collected at HS-1 

3. train instructors to render performance judgments on task trials; 
i.e., was performance a "1" or a "P"? 

4. collect data on each trainee's task performance by trial 

a. The current decision model (unique to each instructor) 
will determine when to terminate training the task. 

b. Instructors and training managers will have no knowledge of 
CATES system decisions regarding task proficiency. 

5. compare analytically the models using final performance criterion 
(NATOPS flight evaluation performance). 

6. make recommendations as to feasibility. 

Assuming the results of the study are promising, it will be desirable to 
look toward incorporating or designing a CMI system for which these models 
are readily amenable. Semple, Cotton, and Sullivan (1980) have summarized 
the advantages of a CMI system for aircrew training devices applicable to 
all aspects of aircraft flight training. CMI systems compare a student's 
training history with a standard training syllabus made up of lists of clearly 
defined tasks. The "ideal" system assesses student performance on each task
and compares this performance with criteria of acceptable performance. This 
comparison identifies tasks that the student can or still cannot perform. 
System software then composes an individualized set of instructional tasks 
that may be trained in subsequent training sessions or flights. Additional 
factors that may be considered in system design include training asset
availability and prediction of training completion dates.

All the virtues of a well conceived CMI system are contingent upon an 
acceptable, workable performance assessment schema. Figure 4 is a functional 
flow diagram describing the CATES system to be operationally developed and 
tested for use by HS-1. It is premature to assert whether CATES will be a 
"stand alone" system or become an integral subsystem of the Aviation Training 
Support System (ATSS) (Naval Weapons Center, 1978). In either event,
implementing the proficiency determination concept advanced in this report can
only be done efficiently with on-line computer support. The work of Ferguson
(1969) and Kalisch (1980) would have been virtually impossible without
on-line computer support. Also planned are future efforts to determine the
range of applicability to other FRS settings.

Figure 4. Functional Flow Diagram of CATES System (Computer Aided Training
Evaluation and Scheduling System)

POST NOTE 

In summary, this report has shown the variability of flight task per- 
formance and the difficulty encountered in making accurate proficiency deter- 
minations. The CATES system has been introduced as a method to formalize and 
quantify the parameters of the decision process used in making these deter- 
minations, thereby achieving a measure of control. Effort is underway to 
operationally test the CATES system concerning feasibility, validity, and 
range of applicability. This report is a prelude to that effort. 

WALD BINOMIAL PROBABILITY RATIO TEST

The Wald binomial probability ratio test was developed by Wald (1947) as
a means of making statistical decisions using as limited a sample as possible.
The procedure involves the consideration of two hypotheses:



$$H_0: P \le P_1 \quad \text{and} \quad H_1: P \ge P_2$$

where P is the proportion of nondefectives in the collection under
consideration, P1 is the minimum proportion of nondefectives at or below which
the collection is rejected, and P2 is the desired proportion of nondefectives,
at or above which the collection is accepted. Since a simple hypothesis is
being tested against a simple alternative, the basis for deciding between H0
and H1 may be tested using the likelihood ratio:

$$\frac{P_{2n}}{P_{1n}} = \frac{P_2^{\,d_n}\,(1 - P_2)^{\,n - d_n}}{P_1^{\,d_n}\,(1 - P_1)^{\,n - d_n}}$$

Where: P1 = Minimum proportion of nondefectives at or below which the
            collection is rejected.

       P2 = Desirable proportion of nondefectives at or above which the
            collection is accepted.

       n  = Total items in collection.

       dn = Total nondefectives in collection.

The sequential testing procedure provides for a postponement region
based on prescribed values of alpha (α) and beta (β) that approximate the
two types of errors found in the statistical decision process. To test the
hypothesis H0: P = P1, calculate the likelihood ratio and proceed as follows:

1. if $\dfrac{P_{2n}}{P_{1n}} \le \dfrac{\beta}{1 - \alpha}$, accept H0

2. if $\dfrac{P_{2n}}{P_{1n}} \ge \dfrac{1 - \beta}{\alpha}$, accept H1

3. if $\dfrac{\beta}{1 - \alpha} < \dfrac{P_{2n}}{P_{1n}} < \dfrac{1 - \beta}{\alpha}$, take an additional observation.

These three decisions relate well to the task proficiency problem. We 
may use the following rules: 

1. Accept the hypothesis that the grade of P is accumulated in lower 
proportions than acceptable performance would indicate. 

2. Reject the hypothesis that the grade of P is accumulated in lower
proportions than acceptable performance would indicate. By rejecting this 
hypothesis, an alternative hypothesis is accepted that the grade of P is 
accumulated in proportions equal to or greater than desired performance. 

3. Continue training by taking an additional trial(s); a decision
cannot be made with specified confidence. 

The following equations are used to calculate the decision regions of
the sequential sampling decision model.

$$d_n \le \frac{\log\dfrac{\beta}{1 - \alpha}}{\log\dfrac{P_2}{P_1} + \log\dfrac{1 - P_1}{1 - P_2}} + n\,\frac{\log\dfrac{1 - P_1}{1 - P_2}}{\log\dfrac{P_2}{P_1} + \log\dfrac{1 - P_1}{1 - P_2}}$$

$$d_n \ge \frac{\log\dfrac{1 - \beta}{\alpha}}{\log\dfrac{P_2}{P_1} + \log\dfrac{1 - P_1}{1 - P_2}} + n\,\frac{\log\dfrac{1 - P_1}{1 - P_2}}{\log\dfrac{P_2}{P_1} + \log\dfrac{1 - P_1}{1 - P_2}}$$

Where: dn = Accumulation of trials graded as "P" in the sequence

       n  = Total trials presented in the sequence

       P1 = Lowest acceptable proportion of proficient trials (P) required
            to pass the NATOPS flight evaluation with a grade of "Qualified."

       P2 = Proportion of proficient trials (P) that represent desirable
            performance on the NATOPS flight evaluation.

Alpha (α) = The probability of making a type I error (deciding a student is
            proficient when in fact he is not proficient).

Beta (β)  = The probability of making a type II error (deciding a student
            is not proficient when in fact he is proficient).
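
The linear form of the decision boundaries follows from taking logarithms of the likelihood ratio. A brief derivation sketch for the upper ("accept H1") boundary is given here for orientation, using the symbols defined above; the lower boundary follows in the same way with $\beta/(1-\alpha)$ in place of $(1-\beta)/\alpha$ and the inequality reversed.

$$\log\frac{P_{2n}}{P_{1n}} = d_n \log\frac{P_2}{P_1} + (n - d_n)\log\frac{1 - P_2}{1 - P_1} \;\ge\; \log\frac{1 - \beta}{\alpha}$$

$$d_n\left[\log\frac{P_2}{P_1} + \log\frac{1 - P_1}{1 - P_2}\right] \;\ge\; \log\frac{1 - \beta}{\alpha} + n\,\log\frac{1 - P_1}{1 - P_2}$$

Because $P_2 > P_1$, the bracketed factor is positive, so dividing both sides by it preserves the inequality and yields an intercept term plus a slope term multiplied by n.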

The first term of the two equations will determine the intercepts of the
two linear equations. The width between these intercepts is determined
by the values selected for alpha (α) and beta (β). The width between the
intercepts translates into a region of uncertainty; thus as lower values of
alpha (α) and beta (β) are selected this region of uncertainty increases.

The second term of the equations determines the slopes of the linear
equations. Since the second term is the same for both equations, the two
lines are parallel. Values of P1 and P2 as well as differences
between P1 and P2 affect the slope of the lines. This is easily translated
into task difficulty. As P2 values increase, indicating easier tasks, the
slope becomes more steep. This in turn results in fewer trials required in
the sample to reach a decision.

As differences in P1 and P2 increase, the slope also becomes steeper and
the uncertainty region decreases. This is consonant with rational decision
making. When the difference between the lower level of proficiency and upper
level of proficiency is great, it is easier to determine at which proficiency
level the pilot trainee is performing. The concept of differences in P1
and P2 is analogous to the concept of effect size in statistically testing
the difference between the means of two groups. In such statistical testing,
when alpha (α) and beta (β) remain constant, the number of observations
required to detect a significant difference may be reduced as the anticipated
effect size increases (Kalisch, 1980).